On Supervised On-Line Rolling-Horizon Control for Infinite-Horizon Discounted Markov Decision Processes
Authors
Abstract
This note revisits the rolling-horizon control approach to the problem of a Markov decision process (MDP) with the infinite-horizon discounted expected reward criterion. In contrast with classical value-iteration approaches, we develop an asynchronous on-line algorithm based on policy iteration, integrated with a multi-policy improvement method of policy switching. A sequence of monotonically improving solutions to a forecast-horizon sub-MDP is generated by updating the current solution only at the currently visited state, building in effect a rolling-horizon control policy for the MDP over the infinite horizon. Feedback from “supervisors,” if available, can also be incorporated while updating. We focus on the convergence issue in relation to the transition structure of the MDP. Either a globally optimal policy or a “locally optimal” fixed policy is achieved in finite time, depending on that structure.
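The asynchronous, single-state policy-improvement idea described in the abstract can be sketched on a toy problem. Everything below (the 3-state MDP, its rewards, and the trajectory loop) is an illustrative assumption, not code or data from the paper: the policy is improved only at the state currently being visited, which still yields monotone improvement of the policy's value.

```python
import numpy as np

# A toy 3-state, 2-action MDP (all numbers are illustrative).
P = np.array([  # P[a, s, s']: transition probabilities
    [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.1, 0.9]],   # action 0
    [[0.2, 0.8, 0.0], [0.0, 0.2, 0.8], [0.5, 0.0, 0.5]],   # action 1
])
R = np.array([[0.0, 1.0, 2.0], [0.5, 0.0, 3.0]])  # R[a, s]: expected reward
gamma = 0.9

def policy_value(pi):
    """Exact value of a stationary policy via the linear Bellman equation."""
    Ppi = np.array([P[pi[s], s] for s in range(3)])
    rpi = np.array([R[pi[s], s] for s in range(3)])
    return np.linalg.solve(np.eye(3) - gamma * Ppi, rpi)

# Asynchronous on-line improvement: along a simulated trajectory, update the
# policy only at the currently visited state (greedy w.r.t. the current
# policy's value), then follow the updated policy to the next state.
rng = np.random.default_rng(0)
pi = np.zeros(3, dtype=int)               # initial policy: always action 0
s = 0
for _ in range(200):
    v = policy_value(pi)
    q = R[:, s] + gamma * P[:, s] @ v     # Q(s, a) at the visited state only
    pi[s] = int(np.argmax(q))             # improve at s alone
    s = rng.choice(3, p=P[pi[s], s])      # move under the updated policy

print(pi, policy_value(pi))
```

Each single-state improvement step is a standard policy-iteration improvement restricted to one state, so the value of the policy never decreases at any state.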
Related resources
On the Use of Non-Stationary Policies for Infinite-Horizon Discounted Markov Decision Processes
We consider infinite-horizon γ-discounted Markov Decision Processes, for which it is known that there exists a stationary optimal policy. We consider the algorithm Value Iteration and the sequence of policies π1, . . . , πk it implicitly generates until some iteration k. We provide performance bounds for non-stationary policies involving the last m generated policies that reduce the state-of-t...
Information Relaxation Bounds for Infinite Horizon Markov Decision Processes
We consider the information relaxation approach for calculating performance bounds for stochastic dynamic programs (DPs), following Brown, Smith, and Sun (2010). This approach generates performance bounds by solving problems with relaxed nonanticipativity constraints and a penalty that punishes violations of these constraints. In this paper, we study infinite horizon DPs with discounted costs a...
Average Optimality in Nonhomogeneous Infinite Horizon Markov Decision Processes
We consider a nonhomogeneous stochastic infinite horizon optimization problem whose objective is to minimize the overall average cost per period of an infinite sequence of actions (average optimality). Optimal solutions to such problems will in general be non-stationary. Moreover, a solution which initially makes poor decisions, and then selects wisely thereafter, can be average optimal. Howeve...
Infinite Horizon Discounted Cost Problems
• We often approximate a large number of periods, even if the horizon is known and finite, by assuming an infinite number of periods, and hope that this assumption will simplify the solution. Indeed, even if the general theory becomes more involved, the solution obtained often is simpler and has important computational and conceptual advantages: in particular, the optimal policy is often statio...
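The classical view referenced here can be sketched with value iteration on a hypothetical 2-state discounted MDP (all numbers are illustrative): the Bellman operator is a γ-contraction, so iterating it converges geometrically to V*, and a greedy policy with respect to V* is stationary and optimal.

```python
import numpy as np

# Illustrative 2-state, 2-action discounted MDP.
P = np.array([
    [[0.7, 0.3], [0.4, 0.6]],   # action 0
    [[0.1, 0.9], [0.8, 0.2]],   # action 1
])
R = np.array([[1.0, 0.0], [0.0, 2.0]])  # R[a, s]
gamma = 0.95

# Value iteration: apply the Bellman optimality operator until the
# successive iterates stop changing (guaranteed by the gamma-contraction).
V = np.zeros(2)
for _ in range(2000):
    Q = R + gamma * np.einsum('ast,t->as', P, V)   # Q[a, s]
    V_new = Q.max(axis=0)
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new

pi_star = Q.argmax(axis=0)   # greedy policy w.r.t. V*: stationary and optimal
print(V, pi_star)
```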
Solving Infinite Horizon Discounted Markov Decision Process Problems for a Range of Discount Factors
In this paper we will assume the following framework. There is a finite state set I, with i ∈ I as its generic member, 1 ≤ i ≤ m. For each i ∈ I, there is a finite action set K(i), with k ∈ K(i) as its generic member. For each i ∈ I, k ∈ K(i), there is a transition probability, p(i, j, k), that if at a decision epoch the state is i ∈ I, and if action k ∈ K(i) is taken, then the state will be j ...
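The framework above (finite state set I, per-state action sets K(i), transitions p(i, j, k)) can be encoded directly. The toy instance below is a hypothetical example, not from the paper, solved for two discount factors to show that the optimal stationary policy may change with the factor.

```python
import numpy as np

# Hypothetical instance of the framework: m states, action sets K(i),
# sparse transitions p(i, j, k), and immediate rewards r(i, k).
m = 2
K = {0: [0, 1], 1: [0]}                        # K(i): allowed actions at state i
p = {(0, 0, 0): 1.0,                           # p(i, j, k)
     (0, 1, 1): 1.0,
     (1, 1, 0): 1.0}
r = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 5.0}    # r(i, k)

def value_iteration(beta, iters=2000):
    """Optimal values for discount factor beta by iterating the Bellman operator."""
    V = np.zeros(m)
    for _ in range(iters):
        V = np.array([max(r[(i, k)] + beta * sum(p.get((i, j, k), 0.0) * V[j]
                                                 for j in range(m))
                          for k in K[i]) for i in range(m)])
    return V

for beta in (0.1, 0.9):
    V = value_iteration(beta)
    pi = [max(K[i], key=lambda k: r[(i, k)] + beta * sum(
              p.get((i, j, k), 0.0) * V[j] for j in range(m)))
          for i in range(m)]
    print(beta, pi)
```

In this instance, staying at state 0 earns 1 per period, while moving to the absorbing state 1 forgoes one period's reward but then earns 5 per period; the switch is worthwhile only when the discount factor is large enough (here, beta > 0.2).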
Journal
Journal title: IEEE Transactions on Automatic Control
Year: 2023
ISSN: 0018-9286, 1558-2523, 2334-3303
DOI: https://doi.org/10.1109/tac.2023.3274791